Large-Scale Taxonomy Mapping for Restructuring and Integrating Wikipedia
Abstract
We present a knowledge-rich methodology for disambiguating Wikipedia categories with WordNet synsets and for using this semantic information to restructure a taxonomy automatically generated from the Wikipedia category system. We evaluate against a manual gold standard and show that both category disambiguation and taxonomy restructuring perform with high accuracy. We also assess these methods on automatically generated datasets and show that they can effectively enrich WordNet with a large number of instances from Wikipedia. Our approach produces an integrated resource that brings together the fine-grained classification of instances in Wikipedia and a well-structured top-level taxonomy from WordNet.
Similar resources
Distinguishing between Instances and Classes in the Wikipedia Taxonomy
This paper presents an automatic method for differentiating between instances and classes in a large scale taxonomy induced from the Wikipedia category network. The method exploits characteristics of the category names and the structure of the network. The approach we present is the first attempt to make this distinction automatically in a large scale resource. In contrast, this distinction has...
Using Goi-Taikei as an Upper Ontology to Build a Large-Scale Japanese Ontology from Wikipedia
We present a novel method for building a large-scale Japanese ontology from Wikipedia using one of the largest Japanese thesauri, Nihongo Goi-Taikei (referred to hereafter as “Goi-Taikei”), as an upper ontology. First, the leaf categories in the Goi-Taikei hierarchy are semi-automatically aligned with semantically equivalent Wikipedia categories. Then, their subcategories are created automatical...
Deriving a Large-Scale Taxonomy from Wikipedia
We take the category system in Wikipedia as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexico-syntactic matching. As a result, we are able to derive a large-scale taxonomy containing a large number of subsumption, i.e. isa, relations. We evaluate the quality of the created resource by comparing it with ResearchCyc...
WikiTaxonomy: A Large Scale Knowledge Resource
We present a taxonomy automatically generated from the system of categories in Wikipedia. Categories in the resource are identified as either classes or instances and included in a large subsumption, i.e. isa, hierarchy. The taxonomy is made available in RDFS format to the research community, e.g. for direct use within AI applications or to bootstrap the process of manual ontology creation.
The importance of cross-lingual information for matching Wikipedia with the Cyc ontology
In this paper we try to answer the question of how cross-lingual evidence may improve matching between different classification schemas. We concentrate specifically on the task of mapping between Wikipedia categories and Cyc terms, as well as the classification of Wikipedia articles to the Cyc taxonomy, and show how this process may be improved by consuming the evidence that is available in different edi...